## [1] "LK Altenburger Land" "LK Uelzen"           "SK Düsseldorf"

Data and Methods

Raw PM2.5 data were provided by the German Environment Agency (Umweltbundesamt, UBA) and WAQI. …

Data on total daily COVID-19 cases and deaths were provided by the Robert Koch-Institut. For some analyses, the data were smoothed using a Gaussian loess function with span 0.3 to reduce variation caused by delayed reporting over the weekend, especially on Sundays and Mondays. The daily number of new infections was computed from the day-to-day differences of the total confirmed case numbers.
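The two preprocessing steps can be illustrated with a minimal Python sketch. This is not the code used for this report: `daily_new_cases` and `local_linear_smooth` are hypothetical helper names, and the smoother is a simplified loess-like local linear fit with tricube weights (the bandwidth is widened by one step so that every neighbour keeps a positive weight), so its output will differ in detail from a full loess implementation.

```python
def daily_new_cases(cumulative):
    """Day-to-day differences of a cumulative case series."""
    return [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]


def local_linear_smooth(y, span=0.3):
    """Loess-like smoother: tricube-weighted local linear fit at each index.

    Assumption: equally spaced daily values; `span` is the fraction of
    points used in each local fit (at least 3 points are required).
    """
    n = len(y)
    k = min(n, max(3, round(span * n)))  # neighbours per local fit
    smoothed = []
    for i in range(n):
        nearest = sorted(range(n), key=lambda j: abs(j - i))[:k]
        h = max(abs(j - i) for j in nearest) + 1  # simplified bandwidth
        sw = sx = sy = sxx = sxy = 0.0
        for j in nearest:
            x = j - i  # centre the local fit on day i
            w = (1 - (abs(x) / h) ** 3) ** 3  # tricube weight
            sw += w
            sx += w * x
            sy += w * y[j]
            sxx += w * x * x
            sxy += w * x * y[j]
        slope = (sw * sxy - sx * sy) / (sw * sxx - sx * sx)
        smoothed.append((sy - slope * sx) / sw)  # fitted value at x = 0
    return smoothed


# Invented cumulative totals with a reporting dip on two days.
totals = [0, 2, 5, 9, 14, 15, 16, 24, 33, 43, 54, 56, 58, 72]
new_cases = daily_new_cases(totals)
smoothed_cases = local_linear_smooth(new_cases, span=0.3)
```

The numbers in `totals` are invented for illustration only.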

The data are available for the following NUTS 3 regions.

Analysis 1: Countrywide temporal development of PM2.5 values and COVID-19 cases in Germany

The following figure shows the countrywide average of daily PM2.5 and smoothed COVID-19 new infections. Black vertical dotted lines mark the start of the contact restrictions (March 14th) and of the shutdown (March 17th).

Some studies compare the development of atmospheric parameters and COVID-19 cases over the entire available time series. As both air pollution levels and COVID-19 infection rates generally decrease after a shutdown event, at least part of the identified correlations is likely caused by this external event, especially in regions where local activity strongly influences local air quality.

To focus on the correlated development of PM2.5 and COVID-19 infections during the early, exponentially growing phase, we restrict the time series to the window between two dates. It starts one maximum incubation period of about 14 days before the first reported infection. It ends at the turning point of the infection dynamics, shortly after the absolute maximum, which occurs about 14 days after the shutdown date. This sets the date limits to February 15th and April 1st.
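The date arithmetic behind this window can be checked with a short Python sketch. The year 2020 is an assumption (the text gives only day and month), and `restrict_window` is a hypothetical helper, not a function from the original analysis.

```python
from datetime import date, timedelta

# Assumption: all dates refer to the year 2020.
SHUTDOWN = date(2020, 3, 17)
START, END = date(2020, 2, 15), date(2020, 4, 1)

# About 14 days after the shutdown gives the approximate date of the maximum;
# the window closes shortly afterwards.
approx_maximum = SHUTDOWN + timedelta(days=14)


def restrict_window(records, start=START, end=END):
    """Keep (date, value) pairs whose date lies in [start, end]."""
    return [(day, value) for day, value in records if start <= day <= end]


window_length_days = (END - START).days + 1  # both end points included
```

Under these assumptions, `approx_maximum` is March 31st, one day before the end of the window.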

For the following figure, smoothed daily COVID-19 new infections have been detrended using a quasi-Poisson regression, and the time series has afterwards been restricted to the aforementioned time period.
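A minimal Python sketch of this kind of detrending is given below. It makes several assumptions that the text does not confirm: the trend is log-linear in the day index, and "detrended" means the response residuals (observed minus fitted). The coefficients are fitted by iteratively reweighted least squares for a Poisson model with log link. A quasi-Poisson fit yields the same coefficient estimates and differs only in the estimated dispersion. The function names are hypothetical.

```python
import math


def fit_log_linear_trend(counts, iterations=25):
    """Fit log(mu_t) = a + b * t by iteratively reweighted least squares."""
    t = range(len(counts))
    mu = [y + 0.1 for y in counts]  # common starting values for count models
    a = b = 0.0
    for _ in range(iterations):
        # Working response and weights for the log link.
        z = [math.log(m) + (y - m) / m for y, m in zip(counts, mu)]
        sw = sum(mu)
        st = sum(m * ti for m, ti in zip(mu, t))
        sz = sum(m * zi for m, zi in zip(mu, z))
        stt = sum(m * ti * ti for m, ti in zip(mu, t))
        stz = sum(m * ti * zi for m, ti, zi in zip(mu, t, z))
        b = (sw * stz - st * sz) / (sw * stt - st * st)
        a = (sz - b * st) / sw
        mu = [math.exp(a + b * ti) for ti in t]
    return a, b


def detrend(counts):
    """Response residuals: observed counts minus the fitted trend."""
    a, b = fit_log_linear_trend(counts)
    return [y - math.exp(a + b * t) for t, y in enumerate(counts)]


# Invented daily counts, for illustration only.
residuals = detrend([1, 2, 2, 4, 5, 9, 12, 20, 28, 45])
```

Because the model contains an intercept, the residuals of the fitted trend sum to approximately zero.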

Cross-correlation on countrywide average

The cross-correlation between PM2.5 and the detrended, smoothed daily COVID-19 new infections shows that PM2.5 is both leading by up to 6 days and lagging by up to 9 days. Given the non-stationary features of the COVID-19 infections, the results must be interpreted with care.
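To make the notions of leading and lagging concrete, the following Python sketch computes a lagged correlation. It is a simplification under stated assumptions: it uses the Pearson correlation of the overlapping segments at each lag, which is not identical to the estimator of standard cross-correlation routines, and the sign convention (positive lag means the first series leads) is chosen here for illustration. The function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def cross_correlation(x, y, max_lag):
    """Return {lag: r}. A positive lag pairs x[t] with y[t + lag] (x leads y)."""
    n = len(x)
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            xs, ys = x[:n - lag], y[lag:]
        else:
            xs, ys = x[-lag:], y[:n + lag]
        result[lag] = pearson(xs, ys)
    return result


# Synthetic example: y repeats x three steps later, so x leads by 3.
base = [math.sin(0.5 * t) + 0.3 * math.sin(1.3 * t) for t in range(40)]
x, y = base[3:], base[:-3]
ccf = cross_correlation(x, y, max_lag=9)
best_lag = max(ccf, key=ccf.get)
```

In this synthetic case the maximum sits at a lag of 3 steps. With trending or otherwise non-stationary series, large correlations can appear at many lags at once, which is why the results above call for cautious interpretation.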

Wavelet coherence analysis on countrywide average

Since the detrended time series is still rather non-stationary, and to get a better idea of the periods and date ranges associated with certain time lags, a wavelet coherence analysis is performed with a loess smoother.

The analysis shows that for periods of 4 to 6 days, the PM2.5 time series is leading around March 1st by up to 2 or 3 days (arrows pointing up and to the right). The situation changes towards April 1st: here the COVID-19 cases take over the lead, with a time lag of about 25% of the period. The relationship is also inverted, mainly as a consequence of the sharp increase in PM2.5 values due to a Saharan dust event.
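A phase difference translates into a time lag through the period under consideration. The short Python sketch below shows the arithmetic for a lag of 25% of the period, i.e. a quarter cycle. `phase_to_lag_days` is a hypothetical helper and says nothing about the arrow convention of any particular plotting package.

```python
import math


def phase_to_lag_days(phase_rad, period_days):
    """Convert a phase difference (radians) into a time lag (days)."""
    return phase_rad / (2 * math.pi) * period_days


quarter_cycle = math.pi / 2  # 25% of a full cycle
lag_short = phase_to_lag_days(quarter_cycle, 4)  # period of 4 days
lag_long = phase_to_lag_days(quarter_cycle, 6)   # period of 6 days
```

For periods of 4 to 6 days, a quarter-cycle lag therefore corresponds to roughly 1 to 1.5 days.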

Analysis 2: Dynamic time warping clustering on countrywide average

The map shows clusters (different colors) of regions with a similar development of smoothed daily COVID-19 new infections. The cluster ID and the number (n) of NUTS 3 regions within each cluster are given in the legend.

## 
##  Precomputing distance matrix...
## 
## Iteration 1: Changes / Distsum =  127 / 47679
## Iteration 2: Changes / Distsum =   34 / 34011
## Iteration 3: Changes / Distsum =   30 / 27779
## Iteration 4: Changes / Distsum =   19 / 19721
## Iteration 5: Changes / Distsum =    1 / 19194
## Iteration 6: Changes / Distsum =    7 / 18724
## Iteration 7: Changes / Distsum =    9 / 17943
## Iteration 8: Changes / Distsum =    2 / 17917
## Iteration 9: Changes / Distsum =    0 / 17917
## 
##  Elapsed time is 0.33 seconds.
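The distance underlying this kind of clustering can be sketched in Python as follows. This is a textbook dynamic time warping distance with unit step weights and absolute differences as local cost. The clustering run above may have used a different step pattern, window constraint or normalization, so values from this sketch are not comparable to the distance sums in the log.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance with absolute-difference local cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j]: minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Two invented curves with the same shape, one starting a day later.
early = [0, 1, 2, 1, 0]
late = [0, 0, 1, 2, 1, 0]
shape_distance = dtw_distance(early, late)
```

Because warping absorbs the one-day offset, the two example curves have a distance of zero. This is the property that lets regions with similarly shaped but time-shifted infection curves fall into the same cluster.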

The plots show the average of daily PM2.5 and smoothed new infections, along with the detrended infection series, within each cluster. The cluster ID and the number (n) of NUTS 3 regions within each cluster are given in the figure headers.

Cross-correlation on cluster averages

Cross-correlations between PM2.5 and detrended daily COVID-19 new infections are similar to those of the countrywide average.

Wavelet coherence analysis on cluster averages

The wavelet coherence analysis for each cluster reveals some considerable differences.

Explanatory power of PM2.5 for COVID-19 cases

Generalized additive model based on nationwide dataset

The following generalized additive model (GAM) indicates the predictive power of PM2.5 at the nationwide level. An internal 30-fold leave-location-out cross-validation (a location being a NUTS 3 region) has been performed using a generalized additive model with a quasi-Poisson family and log link. The date range is again restricted to February 15th to April 1st.

## 
## Family: quasipoisson 
## Link function: log 
## 
## Formula:
## .outcome ~ s(pm25_mean)
## 
## Parametric coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.42224    0.04312   32.98   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##                edf Ref.df     F p-value    
## s(pm25_mean) 8.775  8.984 35.61  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =  0.0569   Deviance explained = 13.8%
## GCV = 13.508  Scale est. = 37.638    n = 5969
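The idea of a leave-location-out cross-validation is that all observations of one region end up in the same fold, so that a model is always evaluated on regions it has not seen during training. A minimal Python sketch of such a fold assignment follows. `leave_location_out_folds` is a hypothetical helper and not the routine used to produce the results above.

```python
import random


def leave_location_out_folds(locations, k, seed=0):
    """Split row indices into k folds so that no location spans two folds.

    `locations` holds one location label per observation (row).
    """
    unique = sorted(set(locations))
    random.Random(seed).shuffle(unique)  # reproducible random assignment
    fold_of = {loc: i % k for i, loc in enumerate(unique)}
    folds = [[] for _ in range(k)]
    for row, loc in enumerate(locations):
        folds[fold_of[loc]].append(row)
    return folds


# Invented region labels, one per daily observation.
regions = ["A", "A", "B", "C", "C", "C", "D", "E", "E", "F"]
folds = leave_location_out_folds(regions, k=3)
for held_out in folds:
    training_rows = [r for r in range(len(regions)) if r not in held_out]
    # Fit on training_rows, evaluate on held_out.
```

Compared with a purely random split of rows, this grouping avoids overly optimistic error estimates that arise when neighbouring days of the same region appear in both the training and the test data.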

Generalized additive model based on cluster averages

As in the nationwide case, GAMs have been trained to estimate the predictive power of PM2.5 within the individual German clusters. The estimates are based on a 10-fold leave-location-out cross-validation.

## [[1]]
## Generalized Additive Model using Splines 
## 
## 5969 samples
##    1 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 611, 564, 611, 658, 564 
## Resampling results across tuning parameters:
## 
##   select  RMSE      Rsquared    MAE     
##   FALSE   14.77010  0.04412117  6.021803
##    TRUE   14.77153  0.04383234  6.022389
## 
## Tuning parameter 'method' was held constant at a value of GCV.Cp
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were select = FALSE and method = GCV.Cp.
## 
## [[2]]
## Generalized Additive Model using Splines 
## 
## 5969 samples
##    1 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 1363, 1410, 1551, 1457, 1363 
## Resampling results across tuning parameters:
## 
##   select  RMSE      Rsquared    MAE    
##   FALSE   15.86346  0.03931688  5.97335
##    TRUE   15.87005  0.03861111  5.97418
## 
## Tuning parameter 'method' was held constant at a value of GCV.Cp
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were select = FALSE and method = GCV.Cp.
## 
## [[3]]
## Generalized Additive Model using Splines 
## 
## 5969 samples
##    1 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 1316, 1363, 1457, 1363, 1269 
## Resampling results across tuning parameters:
## 
##   select  RMSE      Rsquared    MAE     
##   FALSE   15.72147  0.03972646  5.983714
##    TRUE   15.72278  0.04004774  5.978103
## 
## Tuning parameter 'method' was held constant at a value of GCV.Cp
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were select = FALSE and method = GCV.Cp.
## 
## [[4]]
## Generalized Additive Model using Splines 
## 
## 5969 samples
##    1 predictor
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 1316, 1410, 1504, 1410, 1316 
## Resampling results across tuning parameters:
## 
##   select  RMSE      Rsquared    MAE     
##   FALSE   15.79064  0.03921827  6.001699
##    TRUE   15.79744  0.03851950  6.002261
## 
## Tuning parameter 'method' was held constant at a value of GCV.Cp
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were select = FALSE and method = GCV.Cp.
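For reading the RMSE, Rsquared and MAE columns above, the following Python sketch shows how such measures can be computed from observed and predicted values. It assumes that Rsquared denotes the squared Pearson correlation between observations and predictions, which is a common default for resampling summaries but is not confirmed by the output itself. The function names are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mae(observed, predicted):
    """Mean absolute error."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n


def r_squared_corr(observed, predicted):
    """Squared Pearson correlation between observed and predicted values."""
    n = len(observed)
    mo, mp = sum(observed) / n, sum(predicted) / n
    sop = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    soo = sum((o - mo) ** 2 for o in observed)
    spp = sum((p - mp) ** 2 for p in predicted)
    return sop * sop / (soo * spp)


# Invented values: predictions are perfectly correlated but biased by +1.
obs = [1.0, 2.0, 3.0, 4.0]
pred = [2.0, 3.0, 4.0, 5.0]
scores = (rmse(obs, pred), mae(obs, pred), r_squared_corr(obs, pred))
```

The example illustrates that a correlation-based Rsquared ignores systematic bias, so it should be read together with RMSE and MAE rather than on its own.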